Virus particle 24 (VP24) is the smallest protein of the Ebola and Marburg virus genomes. Recent
experiments show that Ebola VP24 blocks binding of tyrosine-phosphorylated STAT-1 homodimer (PY-STAT1)
to the NPI-1 subfamily of importin alpha, thereby preventing nuclear accumulation of this
interferon-promoting transcription factor which, in turn, reduces the innate immune response of the
host target. Lacking an experimental structure for VP24, we applied de novo protein structure
prediction using the fragment assembly-based Rosetta method to classify its fold topology and better
understand its biological function. Filtering and ranking of models were performed with the DFIRE
all-atom statistical potential and the CHARMM22 force field with a generalized Born solvent model.
From 40,000 Rosetta-generated structures and selective comparisons with the SCOP database, a
structural match to two of our top-10 ranking models was the Armadillo repeat fold topology. Specific
members of this fold family include importin alpha, importin beta, and exportin. We propose that,
unlike the nuclear import of host cargo, VP24 lacks a classical nuclear localization signal (NLS) and
targets importin alpha in a similar manner to the observed heterodimeric complex with exportin,
thereby interfering with the auto-inhibitory NLS on importin alpha and blocking peripheral docking
sites for PY-STAT1 assembly.

Published  by  Elsevier  Inc. 


1.  Introduction 

Ebola and Marburg hemorrhagic fever viruses, sole members of the Filoviridae family, produce two of
the most deadly human diseases known. Of the seven proteins encoded in the viral genome, virus
particle 24 protein (commonly referred to as “VP24”) is the smallest and one of the least understood.
As with many of the proteins in such a compactly-encoded virus, one might expect VP24 to exhibit more
than one function. In fact, imaging studies of VP24-transfected cells (Han et al., 2003) indicate
that the protein localizes both in the plasma membrane and near the nucleus. Roles in its
membrane-bound state might include assistance with viral assembly (Huang et al., 2002), budding (Han
et al., 2003), and transcription (Hoenen et al., 2006). In its soluble state, it serves as a matrix
protein (Han et al., 2003). Moreover, experiments have shown that VP24, VP35, and NP together are
necessary and sufficient to form nucleocapsid-like structures in vivo (Huang et al., 2002). Further
studies concluded that VP24 is necessary for the nucleocapsid to be functional, as lack of VP24 in
infectious virus-like particles led to reduced transcription and/or translation of another Ebola
protein, VP30 (Hoenen et al., 2006). Somewhat paradoxically, VP24 also has been found to reduce
transcription and replication of the virus (Watanabe et al., 2007).

Han et al. (2003) characterized different in vitro aqueous forms of VP24 that may be related to its
role as a matrix protein. Specifically, the molecular weight of VP24 oligomers was determined using
different truncations of the N-terminus. The wild-type protein in vitro exists in both monomeric and
tetrameric forms. Removing the first 40 residues disrupted tetramer formation and led to non-specific
aggregation, suggesting that the N-terminus is critical for ordered self-polymerization.

A recent breakthrough was made in the elucidation of one of the functions of VP24 in the cytoplasm of
the host. Reid et al. (2006) showed how Ebola VP24 acts in suppressing interferon production in host
cells. VP24 binds to human importin α5, α6, and α7 (karyopherin α1, α5, and α6, respectively; i.e.,
the NPI-1 subfamily), preventing native tyrosine-phosphorylated STAT-1 homodimer (PY-STAT1) and
STAT-1/STAT-2 heterodimer from translocating into the nucleus (Reid et al., 2006). PY-STAT1 and
STAT-1/STAT-2 heterodimer are host-based transcriptional promoters of interferon α/β and γ,
respectively.

In vitro studies suggest that VP24 competitively binds importin α5/α6/α7, displacing PY-STAT1 (Reid
et al., 2006). If such a process were taking place, it is expected that VP24 binds to the same region
of importin α5/α6/α7 as PY-STAT1, or it locks the auto-inhibitory NLS domain of importin α into the
NLS-binding groove, preventing the NLS of PY-STAT1 from attaching. Melen et al. (2003) show from
mutation experiments that the nuclear localization signal (NLS) of PY-STAT1 binds to the C-terminal
region of importin α5. While it is not considered a traditional NLS-binding region, the structure of
C-terminal-bound cargo was recently determined for an influenza PB2-importin α5 complex (PDB ID:
2JDQ) (Tarendeau et al., 2007).

Our study focused on predicting the structural topology of the VP24 monomer and its fold
classification using bioinformatic and de novo approaches. Our predictions were evaluated in light of
recent experimental observations of Ebola VP24 interfering with the nuclear import process. In
addition, to further validate our de novo fold recognition protocol beyond studies performed
previously (Bonneau et al., 2002), we performed fold prediction on another protein in the Ebola
genome, VP30, for which the structure of the C-terminal domain has been experimentally determined
(Hartlieb et al., 2007).

2.  Materials  and  methods 

The conventional approach to structure prediction is to equate sequence stretches of a query protein
with known protein structures using such tools as PSI-BLAST (Altschul et al., 1997). If this
procedure fails, fold recognition/threading methods are used to deduce coarser-grained similarities
of the estimated sequence characteristics with a database of known fold topologies. If both of these
methods lead to inadequate confidence scores and/or a lack of consensus, computationally intensive de
novo methods, such as fragment assembly (Simons et al., 1997), can be applied as a last resort. We
performed all three types of procedures in this work.

First, we compared the sequences of Zaire Ebola VP24 (denoted here as Ebo-VP24; GenBank accession no.
AAD14588; 251 aa) and Marburg VP24 (Mar-VP24; GenBank accession no. AAQ55260.1; 253 aa) against the
non-redundant (NR) NCBI database (Benson et al., 2007) using PSI-BLAST (Altschul et al., 1997) and
against the UNIPROT sequence database (Apweiler et al., 2004) using the Smith-Waterman algorithm
(SSEARCH34) (Pearson, 1991; Smith and Waterman, 1981). For reference, the two VP24 sequences are
pairwise aligned in Fig. 1. Next, we submitted both sequences to the Bioinfo.pl metaserver (Bujnicki
et al., 2001), which invokes several fold recognition servers including Basic, MetaBasic (Ginalski et
al., 2004), 3D-PSSM (Kelley et al., 2000), Orfeus (Ginalski et al., 2003b), FFAS03 (Jaroszewski et
al., 2005), and the consensus algorithm, 3D-Jury (Ginalski et al., 2003a). In the third approach, we
applied a de novo structure prediction strategy using the RosettaAbInitio program (version 1.1)
(Simons et al., 1997), which has been successful in finding remote fold homologies when
sequence-based methods fail (Bonneau et al., 2004). We used the program DISpro v. 1.0 (Sickmeier et
al., 2007) to identify potentially disordered regions in order to reduce the stretch of residues that
were input to Rosetta.

Before assembling models with Rosetta, we constructed 3- and 9-residue fragment libraries using
make_fragments.pl (Simons et al., 1997). Only one secondary structure prediction program, PSIPRED v.
2.4 (Jones, 1999), was used in this process. The server-based SAM-T06 secondary structure predictor
(Karplus et al., 2005) produced very similar results. The fragment database “vall-2001” was used.
PSI-BLAST (Altschul et al., 1997) invoked by the fragment library program discovered a weak sequence
similarity (e-value = 0.4) to both Ebo-VP24 and Mar-VP24 in the fragment database: mouse importin α
(PDB: 1IAL). As a result, we built two fragment libraries for each sequence, one with 1IAL and one
without.

Twenty thousand backbone-only structures were generated by Rosetta for each sequence. Ten thousand of
the models were generated using the 1IAL-included fragment library; another set of 10,000 structures
was generated with the fragment library that excluded 1IAL. The SCWRL side-chain generation program
(Canutescu et al., 2003) was used to generate all-atom models for each Rosetta-built backbone model.
To improve conformational diversity (Bradley et al., 2005), another 20,000 models for each sequence
were generated by substituting the side chains of the models built from one sequence with the side
chains of the other sequence. In total, 40,000 all-atom models were generated for each VP24 sequence.
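The model counts above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch: the numbers come from the text, while the variable names are illustrative and not part of the original protocol.

```python
# Model-count bookkeeping for ONE VP24 sequence.
# Numbers are taken from the text; all names are illustrative assumptions.
MODELS_PER_RUN = 10_000
fragment_libraries = ["with_1IAL", "without_1IAL"]

# Backbones Rosetta built directly from this sequence's two fragment libraries.
own_backbones = MODELS_PER_RUN * len(fragment_libraries)

# Backbones built for the other VP24 sequence, given this sequence's side chains.
swapped_backbones = MODELS_PER_RUN * len(fragment_libraries)

total_all_atom_models = own_backbones + swapped_backbones
n_batches = total_all_atom_models // MODELS_PER_RUN

print(own_backbones, swapped_backbones, total_all_atom_models, n_batches)
```

Each sequence therefore ends up with four batches of 10,000 all-atom models, which is the batch size used in the scoring steps that follow.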

Each of the Rosetta models was evaluated with the DFIRE-based all-atom (DFIRE-AA) statistical
potential using an in-house implementation (Zhang et al., 2004; Zhou and Zhou, 2002). While Rosetta
has a built-in all-atom scoring function, DFIRE-AA is less sensitive to steric clashes and thus can
be used on non-minimized models. The top-100 scoring structures for each batch of 10,000 models were
minimized with 50 steps of steepest descent followed by 100 steps of adopted-basis Newton-Raphson
using the minCHARMM.pl script (Feig et al., 2004), which invokes CHARMM (Brooks et al., 1983). The
energy function used for minimization was the PARAM22 force field (Mackerell et al., 1998) with a
4r-dielectric electrostatic function. Next, the PARAM22 plus generalized Born molecular volume
solvation (GBMV2) (Lee et al., 2003) energy (including a 15 cal/mol/Å² nonpolar surface area term)
was evaluated for each batch of 100 minimized structures.
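The two-stage selection described above (a cheap filter on every model, then a costlier score on the survivors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Model` class, the function names, and the `score_gb` callable are hypothetical stand-ins. In the actual protocol the second stage involves CHARMM minimization followed by a PARAM22/GBMV2 energy evaluation, which is represented here only by a user-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Model:
    name: str
    dfire: float                      # stage-1 score (lower is better)
    gb_energy: float = float("inf")   # stage-2 score, filled in later


def two_stage_select(
    batches: List[List[Model]],
    score_gb: Callable[[Model], float],
    n_filter: int = 100,
    n_final: int = 5,
) -> List[Model]:
    """Keep the n_filter best models per batch by the stage-1 score,
    rescore the pooled shortlist with score_gb, and return the n_final best."""
    shortlist: List[Model] = []
    for batch in batches:
        shortlist.extend(sorted(batch, key=lambda m: m.dfire)[:n_filter])
    for model in shortlist:
        # Hypothetical stand-in for minimization plus all-atom/GB scoring.
        model.gb_energy = score_gb(model)
    return sorted(shortlist, key=lambda m: m.gb_energy)[:n_final]


# Toy usage with synthetic scores (not real energies).
toy_batches = [
    [Model(f"b{b}_{i}", dfire=float(i)) for i in range(10)] for b in range(2)
]
picked = two_stage_select(
    toy_batches, score_gb=lambda m: -m.dfire, n_filter=3, n_final=2
)
print([m.name for m in picked])
```

With four batches per sequence and the defaults shown, the shortlist holds 400 models, from which five are kept, matching the counts reported in the Results.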

Early papers describing Rosetta methodology employed clustering of generated models to extract
representative models (Tsai et al., 2003). Our studies (Lee and Olson, 2007) and more recent work in
the Baker lab (Bradley et al., 2005; Misura and Baker, 2005) suggest that it is better to attempt to
detect the rare models that achieve the lowest scores rather than clustering. The rationale is that
the united-residue energy function used in the de novo fragment assembly phase is lower resolution
and may not significantly populate near-native topologies. Instead, if a rare near-native model is
generated, the expectation is that an all-atom scoring function will select it out (Bradley et al.,
2005; Lee and Olson, 2007).

Fig. 1. Sequence alignment of Ebo-VP24 and Mar-VP24 rendered with Jalview 2.4 (Waterhouse et al., 2009). The highlighted residues are conserved between the two species.

Table 1

Notable high-scoring models from the Bioinfo.pl metaserver for VP24 (Bujnicki et al., 2001).

Annotation                   PDB ID (chain)  SCOP code    Method                               Scoring method                    Score        VP24 strain (a)  Sequence range (b)
Chaperonin 60                1iok(A)         a.129.1.1    Basic(dist) (Ginalski et al., 2004)  3D-Jury (Ginalski et al., 2003a)  39.00/37.67  E/M              23-200
DNA repair protein           1ee8(A)         a.156.1.2    Basic(dist)                          3D-Jury                           25.14/28.5   E/M              66-163
Cytochrome b6f               1q90(D)         f.32.1.1 (c) FFAS03 (Jaroszewski et al., 2005)    3D-Jury                           14.43/16.5   E/M              54-191
Nuclear cap binding protein  1n52(A)         a.118.1.14   Basic(dist)                          3D-Jury                           14.29        E                53-162
IRF-3                        1j2f(A)         b.26.1.3 (d) FFAS03                               FFAS03                            -6.8/-6.6    E/M              15-164

(a) E = Ebola VP24 sequence; M = Marburg VP24.
(b) VP24 sequence range that was aligned to target sequence.
(c) Transmembrane helix topology.
(d) All-beta topology.

Finally, the five top scoring structures from all of the batches for each sequence were queried in a
structural similarity search against the 95% homologous ASTRAL subset (Chandonia et al., 2004) of the
SCOP 1.71 fold database (Andreeva et al., 2004; Murzin et al., 1995) using the Combinatorial
Extension (CE) program (Shindyalov and Bourne, 1998). The CE Z-score is an aggregate measure of the
RMSD, length, and gaps of the optimal alignment between query and template. Z-score values above 4.5
are indicative of fold similarity; values above 5 are more compelling. The CE program was slightly
modified for this work to generate continuous Z-scores from the continuous probability, P:

Z-score  =  1.26098^-310^0?-  0.445628, 

derived  by  a  fit  to  the  tabular  data  in  the  source  code. 
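The Z-score thresholds quoted above translate into a simple three-way reading of a CE hit. The sketch below encodes only those two cutoffs (4.5 and 5); the function name and the category labels are illustrative choices, not terminology from the CE program.

```python
def interpret_ce_z(z_score: float) -> str:
    """Map a CE Z-score to the qualitative reading used in the text.

    Cutoffs (4.5 and 5) come from the text; the labels are assumptions.
    """
    if z_score > 5.0:
        return "compelling fold similarity"
    if z_score > 4.5:
        return "indicative of fold similarity"
    return "below the 4.5 cutoff"


for z in (5.4, 4.7, 4.2):
    print(z, "->", interpret_ce_z(z))
```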

We have already benchmarked our post-processing scoring functions on Rosetta-generated models with
fewer than 100 residues (Lee and Olson, 2007). Furthermore, others have shown that it is possible to
deduce fold type from de novo predictions (Bonneau et al., 2004; Bonneau et al., 2002). Nonetheless,
we include in this work a validation of our de novo fold recognition protocol on the C-terminal
domain (residues 140-266) of Ebola VP30, which has already been structurally characterized (PDB ID:
2I8B) (Hartlieb et al., 2007). The C-terminus of VP30 has a unique fold amongst known folds and, like
VP24, is dissimilar to any other known sequence outside of filoviruses. The specific sequences
evaluated in this work were GenBank accession no. AAD14587 (“Ebo-VP30”; residues 140-266) and GenBank
accession no. AAQ55259 (“Mar-VP30”; residues 147-274). There were two differences in our de novo
prediction of this protein vs. VP24. First, we did not substitute the side chains of the Ebola
sequence in the Marburg models and vice versa. Thus, only 20,000 models per sequence were generated.
Second, because there were no homologous fragments detected, the 10,000-model runs were simply
performed in duplicate.

3.  Results 

Prior to obtaining fold recognition and de novo predictions, several analyses on the primary
structures and sequences were conducted. We initially considered possible unstructured regions in
VP24. The first 20 residues of the N-terminal and the last 50 C-terminal residues for both Ebola and
Marburg VP24 sequences were predicted by the DISpro program (Sickmeier et al., 2007) to have at least
some structural disorder. The N-terminus of VP24 is known to induce tetramers as opposed to high
molecular-weight aggregates (Han et al., 2003). Conceivably, the N-terminus region becomes more
structured as part of the tetrameric binding interface.

We next examined secondary structure predictions for the entire Ebo-VP24 and Mar-VP24 sequences using
PSIPRED version 2.45 (Jones, 1999). The proteins are nearly equivalent in this analysis as the
position-specific substitution matrices are each composed of the same set of filoviral proteins.
Residues 50-175 appear to be an all-α arrangement, while residues 175-203 form a contiguous
multi-stranded β-sheet.

Optimal pairwise sequence alignments (SSEARCH) with the UNIPROT sequence database (Apweiler et al.,
2004) and PSI-BLAST alignments (Altschul et al., 1997) with the NR protein database (Benson et al.,
2007) indicate that both Ebo-VP24 and Mar-VP24 have no clear sequence homologues among the other
viral, prokaryotic, or eukaryotic sequences available so far. In contrast, the VP24 sequences from
the two species are 35% identical and represent the most conserved viral proteins within the
Filoviridae family.

3.1.  Fold  recognition  servers 

Similar to the sequence searches, fold recognition programs instantiated from the Bioinfo
(http://bioinfo.pl) metaserver for VP24 provided relatively low confidence model predictions. In
Table 1, the most notable and representative structural neighbors predicted include chaperonins, DNA
repair proteins, cytochromes, nuclear cap binding protein and IRF-3. From the SCOP database, the
corresponding detected fold types were the following: multihelical, consisting of 8 helices arranged
in two parallel layers; 3-4 helices; membrane and cell-surface proteins with three transmembrane
helices forming an up-and-down bundle; α-α superhelix; and all-β sandwich. As discussed further
below, the prevalent protein classification was an all-α bundle with non-orthogonal helices.

3.2.  De  novo  predictions 

Lacking clear consensus from the sequence-based fold recognition servers, we applied a de novo
structure approach to find remote fold homologs (Bonneau et al., 2002, 2004). Since the Rosetta
folding program is indicated for protein segments of less than 200 residues (Chivian et al., 2003),
we limited our study to a 151-residue stretch. Using the most conservative measure from DISpro,
residues 21-201 had no predicted disorder. In addition, as seen in Fig. 1, the sequence region
50-200, which we chose to model via de novo fragment assembly, is moderately conserved (45%
similarity) between the sequences of Ebo-VP24 and Mar-VP24 with no alignment gaps. As a confirmation
of our region of focus, de novo predictions based on regions 1-150, 100-251 (Ebo-VP24), and 100-253
(Mar-VP24) yielded little consensus among ascertained fold types (results not shown).

Fig. 2. De novo fragment assembly of Ebo-VP30. (a) DFIRE-AA energies vs. pairwise CE Z-scores to native VP30 C-terminal domain (PDB ID: 2I8B) (Hartlieb et al., 2007) for all
Rosetta models generated in the 2nd run. (b) All-atom force field energies vs. CE Z-score for all Rosetta models generated in the 2nd run. (c) All-atom force field energies vs. CE
Z-score of the 200 models collected from the top-100 DFIRE-AA structures from each run. (d) Graphical representation of the most native-like Ebo-VP30 model (blue) from
the list of the top-5 scoring models superimposed on the experimental structure (green) (Cα RMSD = 4.5 Å for residues 140-250).

To select protein models from Rosetta for structural comparisons, we applied a multi-resolution
scoring and filtering approach. Since our model scoring approach deviates from the Rosetta all-atom
refinement protocol (Bradley et al., 2005; Lee and Olson, 2007), we first illustrate the procedure
with the sequence stretch of Ebo-VP30, which has a known structure (Fig. 2 and Table 2). In Fig. 2a,
the DFIRE-AA energy has a moderate scoring funnel compared to the CE structural similarity measure.
On the other hand, Fig. 2b shows that the all-atom CHARMM22/GBMV2 energy does not have a scoring
funnel with the models generated, although the native structure is the lowest scoring by a
significant margin. Using DFIRE-AA as a filter, by retaining only the two top-100 DFIRE-AA subsets,
the CHARMM22/GBMV2 energy selects out one model in the top-5 with a CE Z-score of 5.4 compared to the
native structure. This model is superimposed on the experimental structure in Fig. 2d. The Cα RMSD
between the two structures is 4.5 Å in the region 140-250. The C-terminal segment (251-270) is not
compacted to the rest of the structure because it is bound to an identical chain in the
experimentally-determined dimer (Hartlieb et al., 2007). Finally, Table 2 summarizes the structural
similarity searches of the top-ranking Rosetta models against the SCOP database of known folds. The
search consensus between Ebo-VP30 and Mar-VP30 correctly selects out the VP30 fold. While not an
entry in SCOP 1.71, the VP30 fold is unique (Hartlieb et al., 2007).


Table  2 

Top  SCOP  fold  type  matches  (as  measured  by  CE  structural  similarity  Z-score)  for  the  top  five  scoring  Ebo-VP30  and  Mar-VP30  Rosetta  protein  structure  models. 
Fig. 3. (a) Distribution of DFIRE-AA energies for a single batch of 10,000 all-atom
Rosetta models of Ebo-VP24 (residues 50-200). The shaded bars indicate the 100
top-scoring models selected for further minimization and scoring with CHARMM/
GB. (b) Distribution of CHARMM/GB scores for the conglomerated 400 minimized
models of Ebo-VP24 (residues 50-200). The shaded bars indicate the five
top-scoring models which were submitted to the CE structural similarity search.


Proceeding to the blind prediction of VP24, Fig. 3 shows the energy distributions for the structures
generated for Ebo-VP24. Fig. 3a illustrates DFIRE-AA energies for the 10,000 Rosetta models of
Ebo-VP24 from which 100 high-ranking structures (shaded bars) were selected for further scoring. A
total of 40,000 all-atom models for each sequence were generated, and a total of 400 structures were
selected, 100 from each of the four 10,000-model sets (see Methods). Fig. 3b illustrates the
application of the CHARMM22/GBMV2 method to select the top five models (shaded bars) out of the 400
for use in the CE structure-structure alignment similarity search.

Listed in Table 3 are the top scoring hits for the VP24 proteins with Z-scores > 4.5 from the CE
analysis. Because of the 35% sequence identity between Ebo-VP24 and Mar-VP24, we once again have
taken a consensus approach of ranking the structural neighbors of the five selected Rosetta models.
The results show the top hit was the Armadillo repeat fold family, which includes proteins importin
α, importin β, exportin and β-catenin. As noted above, importin α has been shown to be a host target
for Ebo-VP24.
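One way to picture a consensus ranking of this kind is sketched below. The text does not spell out the exact ranking rule, so the rule used here is an assumption made purely for illustration: hits with Z-score above 4.5 are pooled, and folds are ordered first by how many of the two sequences support them and then by their best Z-score. The fold names and scores in the usage example are placeholders, not results from this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (sequence label, fold name, CE Z-score); e.g. "E" = Ebola, "M" = Marburg
Hit = Tuple[str, str, float]


def consensus_rank(hits: Iterable[Hit], z_cutoff: float = 4.5) -> List[str]:
    """Rank folds by cross-sequence support, then by best Z-score.

    The ranking rule is an illustrative assumption, not the published one.
    """
    by_fold: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for sequence, fold, z in hits:
        if z > z_cutoff:
            by_fold[fold].append((sequence, z))

    def sort_key(item: Tuple[str, List[Tuple[str, float]]]) -> Tuple[int, float]:
        _, entries = item
        n_sequences = len({seq for seq, _ in entries})
        best_z = max(z for _, z in entries)
        return (-n_sequences, -best_z)

    return [fold for fold, _ in sorted(by_fold.items(), key=sort_key)]


# Placeholder hits (made-up values, for demonstration only).
toy_hits = [
    ("E", "fold_A", 5.1),
    ("M", "fold_A", 4.8),
    ("E", "fold_B", 5.6),
    ("M", "fold_C", 4.4),
]
print(consensus_rank(toy_hits))
```

Under this rule a fold seen for both sequences outranks a fold with a higher Z-score seen for only one, which reflects the idea of treating the two VP24 proteins as sharing one topology.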

Illustrated in Fig. 4 are three of the top-ranking structural models from Rosetta and their 3-D
protein topology diagrams (Westhead et al., 1998). Regions of the Rosetta models that structurally
align with known protein structures are colored green and regions lacking alignments are colored blue
for the N-terminal and red for the β-strand regions. For each of the Rosetta models, the depicted
topology cartoons reflect the diversity among the three predicted folds in packing helices (colored
green) and β-strands (red).


4.  Discussion 

Our prediction protocols were not able to obtain a consensus structural model of the Ebo-VP24 and
Mar-VP24 proteins based entirely on sequence homology or fold recognition. Of the latter, four of the
five top-scoring SCOP classifications were all-α proteins, including a transmembrane helical
up-and-down bundle structure which is consistent with the viral matrix nature of VP24. The remaining
predicted model was an all-β fold. Although there was strong consensus of predicting all-α proteins,
many different folds were recognized of this classification. To help place this prediction in its
proper perspective, there are nearly 1100 unique protein folds classified in SCOP and of this total
nearly 260 are all-α helical proteins (Andreeva et al., 2004).

In an attempt to improve the structural classification of the Ebo-VP24 and Mar-VP24 proteins beyond
sequence and fold threading, we applied de novo predictions using the Rosetta fragment assembly
method. A multi-resolution scoring approach was then applied to filter and select protein models for
structural comparisons with known folds of the SCOP library. The application of the DFIRE-AA
potential to the Rosetta-generated structures yielded a distribution of energies that presumably
favored a small number of native or near-native structures from decoys. The top-ranking DFIRE-AA
structures were then minimized slightly to relieve steric clashes. More extensive refinement, using
for example the Rosetta model relaxation module, can be very constructive when at least some of the
models are expected to be very close to the native structure (Bradley et al., 2005). However,
extensive refinement is significantly more costly than simple minimization, and, perhaps, needlessly
expensive for the lower-resolution fold prediction sought in this work. The CHARMM22/GBMV2 scoring
function differentiates among the top-ranking DFIRE structures and provides detection of protein
models that exhibit higher-resolution local geometric properties (e.g., all-atom side-chain packing,
electrostatic solvent effects, etc.). The optimal goal of any scoring function is to assess sequence
fitness for a given protein fold and our combined approach narrowed the selection down to five
protein models from a total of 40,000 structures for each sequence.

Other researchers have shown that top-scoring Rosetta-generated models can indeed be used to identify
fold type through structural similarity (Bonneau et al., 2002, 2004). However, since our
post-processing algorithm is technically different, we first presented results for another Ebola
protein, VP30, to validate our de novo protocol with a known protein structure in the same genome
with a similar number of residues (N = 127) (Bradley et al., 2005; Lee and Olson, 2007). What we see
from this experiment is that DFIRE-AA acts as a low resolution filter to cull out models moderately
similar to the native. While it has been shown elsewhere that CHARMM22/GBMV2 energy can perform a
higher resolution detection of near-native models than DFIRE-AA (Lee and Olson, 2007), it appears in
this case that CHARMM22/GBMV2 did not enhance detection, presumably because the Rosetta models were
not close enough to the native. Encouragingly, we were able to correctly identify the fold type of
VP30 (Table 2). However, had the experimentally-determined VP30 fold not been present, we would have
had difficulty classifying the sequence as having a new fold.

Table 3

Top SCOP fold type matches (as measured by CE structural similarity Z-score) for the top five scoring Ebo-VP24 and Mar-VP24 Rosetta protein structure models.

Fig. 4. Three of the top-scoring VP24 structural models (residues 50-200) and their
corresponding TOPS (Westhead et al., 1998) topology diagrams on the right: (a)
Armadillo repeat fold (α-helical segments: 52-59, 68-79, 85-101, 104-130, 137-172;
β-sheet segments: 177-183, 187-193, 196-199); (b) superantigen MAM fold
(α-helical segments: 52-62, 69-81, 84-87, 89-98, 104-110, 112-125, 135-158,
160-173; β-sheet segments: 177-180, 187-190; 3₁₀-helical segment: 195-197);
and (c) alpha-catenin/vinculin fold (α-helical segments: 55-62, 66-73, 76-81, 85-88,
94-100, 105-127, 131-141, 147-160, 161-173; β-sheet segments: 177-182,
187-192, 197-199). Segments labeled by helical content (green), sheet content
(red), and region not aligned with structural homologue (blue).

Comparisons of our selected models from Rosetta for VP24 with structures from SCOP revealed several
interesting results. Using a consensus approach of treating the Ebola and Marburg proteins as
evolutionarily conserved topologies, the top-scoring models for residues 50-200 were observed to
populate two major fold clusters: α-α superhelix (rank 1 and 2) and four-helical up-and-down bundle
(rank 3 and 4). There are multiple fold topologies for the α-α superhelix lineage and the SCOP
database contains 23 fold superfamilies for this cluster. For the four-helical up-and-down bundle,
there are 27 superfamilies. While the average Z-scores separating the structural models are less than
0.5 (Table 2), individual Z-scores of approximately 5 or greater are very robust in detecting
structural neighbors. Moreover, the set of scores computed among the structural alignments for each
protein model showed clear discrimination of ranks 1 through 4 within a fold space of 50 helical
superfamilies.

Fig. 5. X-ray crystal structure (PDB ID: 1WA5) (Matsuura and Stewart, 2004) of
importin α (red, orange, and yellow molecular surfaces (Sanner et al., 1996)) and its
auto-NLS (dark blue tube) bound to fragments of exportin (light blue ribbons
[residues 84-190] and green ribbons [residues 211-319]): (a) front view; (b) side
view. The Armadillo-fold VP24 model (silver ribbons) in (a) structurally overlaps
the first exportin fragment with high confidence (Z-score = 5.2) and overlaps the
second exportin fragment with moderate confidence (Z-score = 4.2). Orange-colored
region of importin α was experimentally identified to be minimally
necessary for VP24 binding (Reid et al., 2007). The combined yellow and orange
regions indicate the experimentally characterized PY-STAT1 binding region (Melen
et al., 2003; Reid et al., 2007).

Fig. 6. Sequence and secondary structure comparison of residues 51-170 of Armadillo-fold VP24 model with residues 85-192 of exportin Cse1p (PDB ID: 1WA5) using the
structural alignment obtained by CE (Z-score = 5.2, RMSD = 5.2 Å, sequence identity = 7.4%). The secondary structure assignments were generated with DSSP (Kabsch and
Sander, 1983). DSSP legend: H - alpha helix, G - 3₁₀ helix, T - hydrogen-bonded turn, S - bend, and C - loop (no ordered structure).

The rank 1 match to the Armadillo repeat family is the most intriguing
result for the classification of VP24. What is convincing about this hit
is that this fold family contains importin α and β, plus exportin. As
reported from experimental studies, Ebo-VP24 targets importin α
and inhibits intracellular signaling of the interferon pathway.
Specifically, VP24 interferes with the association of PY-STAT1 to the
C-terminus of importin α5/α6/α7. We predict that there is no fold
similarity between VP24 and PY-STAT1, but rather that the viral
protein is a distant structural homolog of molecules that
participate in transporting cargo across the nuclear envelope.

Essential to deriving a model of VP24 are experimental observations
of how importin α recognizes and binds other proteins,
including members of the same superfamily. The most common
way that proteins bind importin α is through the use of nuclear
localization signals (NLS), which are short stretches of residues
with multiple basic amino acids (i.e., lysine and arginine). Indeed,
Ebo-VP24 contains a curious segment of residues, 13-PKKDLEK-19,
which is almost completely conserved among its sequence variants
(results not shown). However, this region does not produce
any matches with the NLSdb (Nair et al., 2003), which is a database
of known NLS motifs. Furthermore, this stretch of residues is
completely absent in Mar-VP24. If this stretch of residues in Ebo-VP24
were binding like an NLS, and this was its sole mode of binding to
importin α, then it would not be possible to explain how VP24
prevents PY-STAT1 and STAT-1/STAT-2 heterodimer from translocating
into the nucleus, given that NLS sequences, which are quite
common in the proteome of a cell, are not known to inhibit the
nuclear translocation process. Therefore, we can eliminate the
possibility that VP24 uses only an NLS to form the association
complex.
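
As a rough illustration of the composition argument above, the following Python sketch counts basic (K/R) and acidic (D/E) residues in the 13-PKKDLEK-19 segment and scans a sequence for lysine/arginine-rich windows. The function names, the 7-residue window, and the threshold are illustrative assumptions; this is not the NLSdb motif-matching procedure, and a basic-residue count alone neither establishes nor excludes NLS activity.

```python
# Illustrative sketch only: simple residue-composition check, not NLSdb matching.
BASIC = set("KR")   # lysine, arginine
ACIDIC = set("DE")  # aspartate, glutamate


def residue_counts(segment):
    """Return (basic, acidic) residue counts for a peptide segment."""
    seg = segment.upper()
    basic = sum(1 for aa in seg if aa in BASIC)
    acidic = sum(1 for aa in seg if aa in ACIDIC)
    return basic, acidic


def basic_windows(sequence, window=7, min_basic=3):
    """Yield (1-based start, segment, basic, acidic) for K/R-rich windows.

    The window size and min_basic threshold are arbitrary assumptions.
    """
    seq = sequence.upper()
    for i in range(len(seq) - window + 1):
        seg = seq[i:i + window]
        basic, acidic = residue_counts(seg)
        if basic >= min_basic:
            yield i + 1, seg, basic, acidic


segment = "PKKDLEK"  # residues 13-19 of Ebo-VP24, as quoted in the text
basic, acidic = residue_counts(segment)
print(segment, "basic:", basic, "acidic:", acidic)
```

For the Ebo-VP24 segment, the count is three lysines against two acidic residues.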

Alternatively, VP24 must bind importin α independent of NLS
binding. Experimental models of protein-protein complex formation
of importin α that are independent of NLS binding are
homodimerization (e.g., PDB ID: 2JDQ) (Tarendeau et al., 2007)
and the association with exportin in the yeast proteome (PDB
ID: 1WA5) (Matsuura and Stewart, 2004). Although homodimerization
of importin α is observed under crystallographic conditions,
no experimental data exist showing any biological
significance and thus this model of binding VP24 can be presumably
ruled out. In contrast, the binding of exportin recycles
importin α from the nucleus back to the cytoplasm. Based on
the crystallographic structure of importin α bound with exportin
and using structure-structure alignments of our highest-ranked
VP24 structure with exportin, we propose a model illustrated
in Fig. 5. The docking sites for VP24 are located where the structural
alignments with the ARM repeats of exportin showed the
greatest interfacial contacts with importin α. Our model predicts
that one or more monomers of VP24 bind importin α with structural
complementarity to that observed with exportin (Fig. 5). In
fact, the binding interface of the exportin-importin complex overlaps
with ARM repeat 10 (residues 458-504), which is the region on
importin α experimentally known to be required for VP24 binding
(Reid et al., 2007).

There are two consequences of this VP24-importin α structural
model. First, the docked VP24 structures may hinder the release of
the auto-inhibitory NLS of importin α, which can prevent NLSs
from other proteins such as PY-STAT1 from docking into the
NLS-binding groove. Consequently, the N-terminus of importin α locked
in its inhibitory state would be unable to bind importin β
(Cingolani et al., 1999) and subsequently transport cargo such as
PY-STAT1. Second, placement of VP24 at the predicted docking sites
would preclude PY-STAT1 from binding to importin α in the vicinity
of residues 425-538, which is the section of importin α known
to be minimally required for PY-STAT1 complexation (Reid et al.,
2007). For either of these scenarios to occur, the binding affinity
of monomeric or multimeric VP24 to importin α must be fairly
strong. For reference, the binding affinity of exportin to importin
is ~1 nM (Kutay et al., 1997), where RanGTP is a necessary component
for complexation. It is not possible to deduce specific pairwise
residue contacts between VP24 and importin because the CE structural
alignments of our VP24 model with the helical repeats of the
exportin structure (Fig. 5) yield a sequence identity of only 7.4%, as
seen in Fig. 6. Nonetheless, the secondary structural elements of the two
structures align fairly well, further supporting the fold similarity
between our VP24 model and exportin. Further complicating detailed
binding pattern assignments is the fact that the exportin/importin
structure is the yeast variant and not the filovirus-relevant human
version.
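
The sequence identity quoted for a structural alignment is commonly computed as the fraction of aligned residue pairs that are identical. A minimal Python sketch of that calculation follows, assuming two gapped strings of equal length as input. The function name and the toy sequences are hypothetical, and the exact convention used by CE may differ in detail.

```python
# Illustrative sketch: percent identity over aligned (non-gap) residue pairs.
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contribute a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy alignment (not VP24 or exportin data): 4 identical pairs out of 6 aligned.
toy_a = "MKT-AYLA"
toy_b = "MRTQA-LG"
print(round(percent_identity(toy_a, toy_b), 1))
```

At identities as low as the 7.4% reported here, residue-level correspondences in the alignment carry little information, which is why the comparison rests on secondary structure and fold rather than on specific contacts.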

While our prediction suggesting that VP24 has ancestral links to
the Armadillo repeat family is appealing, our model also shows
divergence from this superfamily by containing a β-sheet arrangement
packed against the helical bundle. The prediction accuracy of
a β-sheet is strong among the top-scoring models (Fig. 4) and may
highlight the multi-functional nature of VP24. Because this
secondary structure is absent in exportin, the question becomes
whether the β-sheet has any functional role in VP24. One possibility
is stabilization of the VP24 monomer in aqueous solution, where
apolar residues of the β-sheet shield the hydrophobic core from
the aqueous environment while the hydrophilic residues are positioned
toward the solvent. Many of these residues are conserved
between Ebola and Marburg. Among the top-scoring models, the
β-sheet is connected to the helical core by a common "hinge" segment
of a polar residue and a glycine. When submerged in the viral
lipid matrix, the β-sheet may disconnect from the hydrophobic
face of the now-transmembrane helices (predicted in Table 1) as
exposure of this surface becomes favored. In addition, this hinge
may be active upon binding exportin (Fig. 5) as well as the self-ordered
polymerization of VP24 in aqueous solvent.

Overall, our model of the VP24 protein provides testable
hypotheses for additional experimental work. Using mutagenesis,
the docking sites on importin α that are predicted to overlap
between VP24 and exportin can be examined to validate our binding
model. Mutation studies should also include deleting the
auto-inhibitory NLS segment of importin α to determine whether
this affects binding of VP24. In a similar fashion, the β-sheet
region suggests a new target to explore protein stability. Finally,
our VP24 structural model and the fold similarity with exportin
may prove to be helpful in the design of modifications that
overcome the obstacles to the crystallographic determination of this
viral protein critical to understanding Ebola virus infections.